
    P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs

    Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN networks need to be both reconfigurable and fast in order to support evolving network protocols and increasing multi-gigabit data rates. The combination of packet processing languages with FPGAs seems to be the perfect match for these requirements. In this work, we develop an open-source FPGA-based configurable architecture for arbitrary packet parsing to be used in SDN networks. We generate low-latency, high-speed streaming packet parsers directly from a packet processing program. Our architecture is pipelined and entirely modeled using templated C++ classes. The pipeline layout is derived from a parser graph that corresponds to a P4 program after a series of graph transformation rounds. The RTL code is generated from the C++ description using Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves a 100 Gb/s data rate on a Xilinx Virtex-7 FPGA while reducing latency by 45% and LUT usage by 40% compared to the state of the art. Comment: Accepted for publication at the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 25-27, 2018, Monterey Marriott Hotel, Monterey, California; 7 pages, 7 figures, 1 table.
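    For intuition, the sketch below shows what a single templated parser pipeline stage might look like in this style. The class name, interface, and widths are assumptions made for illustration; they are not taken from the paper or its released code.

        // Hypothetical sketch of one templated streaming-parser stage (not the
        // actual open-source design): it accumulates bus words until a full
        // header has been seen, then exposes the extracted bytes.
        #include <cstdint>
        #include <cstddef>

        template <std::size_t HeaderBytes, std::size_t BusBytes>
        class ParserStage {
        public:
            // One call models one bus beat of the streaming pipeline.
            void step(const uint8_t bus_word[BusBytes], bool &header_valid,
                      uint8_t header[HeaderBytes]) {
                for (std::size_t i = 0; i < BusBytes && filled_ < HeaderBytes; ++i)
                    buffer_[filled_++] = bus_word[i];
                header_valid = (filled_ == HeaderBytes);
                if (header_valid)
                    for (std::size_t i = 0; i < HeaderBytes; ++i)
                        header[i] = buffer_[i];
            }
            void reset() { filled_ = 0; }
        private:
            uint8_t buffer_[HeaderBytes] = {};
            std::size_t filled_ = 0;
        };

    In an actual Vivado HLS flow, such a class would additionally carry interface and pipelining pragmas; they are omitted here for brevity.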

    PoET-BiN: Power Efficient Tiny Binary Neurons

    The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field-Programmable Gate Arrays, embedded processors, and Graphics Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by Multiply-Accumulate operations and by memory accesses for weight fetching. Quantization and pruning have been proposed to address this issue. Though effective, these techniques do not take into account the underlying architecture of the embedded hardware. In this work, we propose PoET-BiN, a Look-Up Table (LUT) based power-efficient implementation for resource-constrained embedded devices. A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain. A LUT access consumes far less power than the equivalent Multiply-Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses. We applied the PoET-BiN architecture to implement the classification layers of networks trained on the MNIST, SVHN and CIFAR-10 datasets, with near state-of-the-art results. The energy reduction for the classifier portion reaches up to six orders of magnitude compared to a floating-point implementation and up to three orders of magnitude compared to recent binary quantized neural networks. Comment: Accepted at the MLSys 2020 conference.
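    To make the LUT-versus-MAC substitution concrete, here is a minimal sketch of a single LUT-style binary neuron: a small binary input pattern addresses a truth table whose entries directly encode the binary activation. The 6-input width and truth-table encoding are our own illustrative assumptions, not PoET-BiN's trained classifiers.

        // A 6-input binary "neuron" realized as a truth-table lookup: the six
        // binary inputs form an address, and the addressed bit is the output.
        // No multiply-accumulate is performed at inference time.
        #include <cstdint>
        #include <bitset>

        struct Lut6Neuron {
            uint64_t truth_table;  // 2^6 = 64 output bits, one per input pattern

            bool fire(const std::bitset<6> &in) const {
                return (truth_table >> in.to_ulong()) & 1ULL;
            }
        };

    On an FPGA, such a table maps naturally onto a single 6-input LUT, which is why a lookup costs far less energy than the MAC it stands in for.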

    CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture

    Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolution layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; and (2) maximizing the utilization factor of processing resources when performing convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated by implementing the convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3x3 and 1x1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms for the convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture exploits the structured sparsity in ResNet-50 to reduce the latency to 42.5 ms when half of the channels are pruned. Comment: 12 pages.
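    For reference, the loop nest below is the plain single-channel 3x3 convolution that such dataflows reorder and tile; it is not CARLA's dataflow, only the operation being accelerated. Sizes and the padding convention are placeholders chosen for this sketch.

        // Naive output-stationary 3x3 convolution over a single channel:
        // each output pixel is accumulated locally and written back once.
        // 'in' is (H+2) x (W+2) with a one-pixel halo, 'w' holds 9 weights,
        // 'out' is H x W, all in row-major order.
        #include <vector>
        #include <cstddef>

        void conv3x3(const std::vector<float> &in, const std::vector<float> &w,
                     std::vector<float> &out, std::size_t H, std::size_t W) {
            for (std::size_t y = 0; y < H; ++y)
                for (std::size_t x = 0; x < W; ++x) {
                    float acc = 0.0f;
                    for (std::size_t ky = 0; ky < 3; ++ky)
                        for (std::size_t kx = 0; kx < 3; ++kx)
                            acc += in[(y + ky) * (W + 2) + (x + kx)] * w[ky * 3 + kx];
                    out[y * W + x] = acc;
                }
        }

    An accelerator's dataflow determines how these loops are tiled across the PE array and which operands stay on-chip, which in turn governs DRAM traffic and PE utilization.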

    Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design

    High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar with hardware design. However, achieving the highest Quality of Results (QoR) with HLS is still unattainable for most programmers, as it requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly code. Moreover, such code is normally in conflict with best coding practices, which favor code reuse, modularity, and conciseness. To overcome these limitations, we propose Module-per-Object (MpO), a human-driven HLS design methodology intended for both hardware designers and software developers with limited FPGA expertise. MpO exploits modern C++ to raise the abstraction level while improving QoR, code readability, and modularity. To guide HLS designers, we present the five characteristics of MpO classes. Each characteristic exploits the power of HLS-supported modern C++ features to build C++-based hardware modules. These characteristics lead to high-quality software descriptions and efficient hardware generation. We also present a use case of MpO in which C++ serves as the intermediate language for FPGA-targeted code generation from P4, a packet processing domain-specific language. The MpO methodology is evaluated using three design experiments: a packet parser, a flow-based traffic manager, and a digital up-converter. Based on these experiments, we show that MpO can be comparable to hand-written VHDL code while keeping a high abstraction level, a human-readable coding style, and modularity. Compared to traditional C-based HLS design, MpO leads to more efficient circuit generation, both in terms of performance and resource utilization. The MpO approach also notably improves software quality, increasing parametrization while eliminating code duplication. Comment: 9 pages. Paper accepted for publication at the 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, San Diego, CA, April 28 - May 1, 2019.
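    A minimal sketch of the "one C++ class per hardware module" flavor described above follows; the module, its interface, and the template parameter are invented for illustration and are not one of the paper's designs.

        // Sketch of an MpO-style module: state lives in private members, the
        // cycle-level behavior is a member function, and structural parameters
        // are template arguments resolved at synthesis time.
        #include <cstdint>

        template <int Width>
        class UpCounter {
            static_assert(Width > 0 && Width < 32, "Width must fit in uint32_t");
        public:
            // One call corresponds to one clock cycle of the synthesized module.
            uint32_t tick(bool enable) {
                if (enable)
                    count_ = (count_ + 1u) & ((1u << Width) - 1u);
                return count_;
            }
        private:
            uint32_t count_ = 0;
        };

    Instantiating, say, UpCounter<8> and UpCounter<16> yields two differently sized modules from one description, which is the kind of parametrization and reuse the methodology aims for.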

    Bridging the Gap: FPGAs as Programmable Switches

    The emergence of P4, a domain-specific language, coupled to PISA, a domain-specific architecture, is revolutionizing the networking field. P4 makes it possible to describe how packets are processed by a programmable data plane, spanning ASICs and CPUs, that implements PISA. Because processing flexibility can be limited on ASICs, while CPU performance for networking tasks lags behind, recent works have proposed implementing PISA on FPGAs. However, little effort has been dedicated to analyzing whether FPGAs are good candidates for implementing PISA. In this work, we take a step back and evaluate the micro-architectural efficiency of various PISA blocks. We demonstrate, supported by a theoretical and experimental analysis, that the performance of a few PISA blocks is severely limited by current FPGA architectures. Specifically, we show that match tables and programmable packet schedulers represent the main performance bottlenecks for FPGA-based programmable switches. We therefore explore two avenues to alleviate these shortcomings. First, we identify network applications well tailored to current FPGAs. Second, to support a wider range of networking applications, we propose modifications to FPGA architectures which can also be of interest outside the networking field. Comment: To be published in the IEEE International Conference on High Performance Switching and Routing (HPSR) 2020.
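    For context, the sketch below models an exact-match table at the behavioral level only, as a software hash map; on an FPGA this structure must instead be mapped onto BRAM or CAM-like resources, which is where the bottleneck discussed above arises. Key and action types are placeholders.

        // Behavioral model of a PISA-style exact-match table: a key extracted
        // from packet headers selects an action. This is deliberately naive and
        // says nothing about how the lookup is realized in hardware.
        #include <cstdint>
        #include <unordered_map>
        #include <optional>

        struct Action { uint16_t egress_port; };

        class ExactMatchTable {
        public:
            void insert(uint64_t key, Action a) { table_[key] = a; }
            std::optional<Action> lookup(uint64_t key) const {
                auto it = table_.find(key);
                if (it == table_.end()) return std::nullopt;
                return it->second;
            }
        private:
            std::unordered_map<uint64_t, Action> table_;
        };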

    Models for the Brane-Bulk Interaction: Toward Understanding Braneworld Cosmological Perturbation

    Using some simple toy models, we explore the nature of the brane-bulk interaction for cosmological models with a large extra dimension. We are particularly interested in understanding the role of the bulk gravitons, which from the point of view of an observer on the brane appear to generate dissipation and nonlocality, effects which cannot be incorporated into an effective (3+1)-dimensional Lagrangian field theoretic description. We explicitly work out the dynamics of several discrete systems consisting of a finite number of degrees of freedom on the boundary coupled to a (1+1)-dimensional field theory subject to a variety of wave equations. Systems both with and without time translation invariance are considered, and moving boundaries are discussed as well. The models considered contain all the qualitative features of quantized linearized cosmological perturbations for a Randall-Sundrum universe having an arbitrary expansion history, with the sole exception of gravitational gauge invariance, which will be treated in a later paper. Comment: 47 pages, RevTeX (or LaTeX, etc.) with 5 eps figures.
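    A standard toy model of this kind, written here purely as an illustration (it is not claimed to be one of the specific systems analyzed in the paper), couples a single boundary oscillator q(t) to a massless field phi(t, x) living on the half-line x >= 0:

        % Boundary oscillator q(t) coupled to a (1+1)-dimensional massless field
        % phi(t,x) on x >= 0; lambda is the strength of the boundary coupling.
        S = \int dt \left[ \tfrac{1}{2}\dot{q}^{2} - \tfrac{1}{2}\omega^{2} q^{2} \right]
          + \int dt \int_{0}^{\infty} dx \, \tfrac{1}{2}\left[ (\partial_{t}\phi)^{2} - (\partial_{x}\phi)^{2} \right]
          + \lambda \int dt \, q(t)\, \phi(t,0)

    Integrating out phi leaves an effective equation of motion for q containing a damping term and a memory kernel, which is the dissipative, nonlocal behavior the abstract attributes to bulk gravitons as seen by a brane observer.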

    Node configuration for the Aho-Corasick algorithm in intrusion detection systems

    In this paper, we analyze the performance and cost trade-off involved in selecting between two representations of nodes when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rule set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock cycle count and memory usage. The parameter determines which of the two types of nodes is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, resulting in an improvement of up to 12× compared with the single node-type case.
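    To illustrate the trade-off being analyzed, here is a sketch of two possible trie-node representations together with a threshold-style selection rule; the structures, names, and the threshold value of 8 are placeholders for this sketch, not the representations or the optimized parameter from the paper.

        // Two candidate Aho-Corasick trie-node layouts: a dense 256-entry jump
        // table (fast lookup, large memory) and a compact sorted list of
        // (symbol, child) pairs (small memory, slower search). In this sketch
        // a next-state value of 0 means "no transition stored here".
        #include <cstdint>
        #include <cstddef>
        #include <vector>
        #include <utility>
        #include <algorithm>

        struct DenseNode {
            uint32_t next_state[256] = {};
            uint32_t next(uint8_t c) const { return next_state[c]; }
        };

        struct SparseNode {
            std::vector<std::pair<uint8_t, uint32_t>> children;  // sorted by symbol
            uint32_t next(uint8_t c) const {
                auto it = std::lower_bound(children.begin(), children.end(),
                                           std::make_pair(c, uint32_t{0}));
                return (it != children.end() && it->first == c) ? it->second : 0;
            }
        };

        // Hypothetical selection rule: nodes with many outgoing edges get the
        // dense layout, the rest get the compact one. The threshold plays the
        // role of the tunable parameter described above.
        constexpr std::size_t kDenseThreshold = 8;
        inline bool use_dense(std::size_t child_count) {
            return child_count >= kDenseThreshold;
        }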

    Alzheimer’s Prevention Initiative Generation Program: Development of an APOE genetic counseling and disclosure process in the context of clinical trials

    Introduction: As the number of Alzheimer’s disease (AD) prevention studies grows, many individuals will need to learn their genetic and/or biomarker risk for the disease to determine trial eligibility. An alternative to traditional models of genetic counseling and disclosure is needed to provide comprehensive, standardized counseling and disclosure of apolipoprotein E (APOE) results efficiently, safely, and effectively in the context of AD prevention trials.
    Methods: A multidisciplinary Genetic Testing, Counseling, and Disclosure Committee was established and charged with operationalizing the Alzheimer’s Prevention Initiative (API) Genetic Counseling and Disclosure Process for use in the API Generation Program trials. The objective was to provide consistent information to research participants before and during the APOE counseling and disclosure session using standardized educational and session materials.
    Results: The Genetic Testing, Counseling, and Disclosure Committee created a process consisting of eight components: requirements of APOE testing and reports, psychological readiness assessment, determination of AD risk estimates, guidance for identifying providers of disclosure, predisclosure education, APOE counseling and disclosure session materials, APOE counseling and disclosure session flow, and assessment of APOE disclosure impact.
    Discussion: The API Genetic Counseling and Disclosure Process provides a framework for large-scale disclosure of APOE genotype results to study participants and serves as a model for disclosure of biomarker results. The process provides education to participants about the meaning and implication(s) of their APOE results while also incorporating a comprehensive assessment of disclosure impact. Data assessing participant safety and psychological well-being before and after APOE disclosure are still being collected and will be presented in a future publication.
    Highlights: Participants may need to learn their risk for Alzheimer’s disease to enroll in studies. Alternatives to traditional models of apolipoprotein E counseling and disclosure are needed. An alternative process was developed by the Alzheimer’s Prevention Initiative. This process has been implemented by the Alzheimer’s Prevention Initiative Generation Program.